length and alphabet size simultaneously grow to infinity

Jean-Christophe Breton and Christian Houdré



Abstract 

Given a random word of size n whose letters are drawn independently from an ordered
alphabet of size m, the fluctuations of the shape of the random RSK Young tableaux are
investigated, when n and m converge together to infinity. If m does not grow too fast and if the
draws are uniform, then the limiting shape is the same as the limiting spectrum of the GUE.
In the non-uniform case, a control of the two largest probabilities ensures the convergence
of the first row of the tableau toward the Tracy-Widom distribution.

Keywords: GUE; longest increasing subsequence; random words; strong approximation; 
Tracy-Widom distribution; Young tableaux. 

AMS 2000 Subject Classification. Primary: 60F05. Secondary: 60B12, 60C05, 60F15.

1 Introduction and results 

Let $\mathcal{A}_m = \{\alpha_1 < \alpha_2 < \cdots < \alpha_m\}$ be an ordered alphabet of size $m$ and let a word be made
of the random letters $X_1^m, \dots, X_n^m$ (independently) drawn from $\mathcal{A}_m$. Recall that the Robinson-
Schensted-Knuth (RSK) correspondence associates to a (random) word a pair of (random) Young
tableaux of the same shape, having at most $m$ rows (see, e.g., [Fu] or [St]). It is then well known
that the length, $V_1(n,m)$, of the top row of these tableaux coincides with the length of the longest
(weakly) increasing subsequence of $X_1^m, \dots, X_n^m$. The behavior of $V_1(n,m)$ when $n$ and/or $m$
go to $+\infty$ and its connections to various areas of mathematics (e.g., random matrices, queueing
theory, percolation theory) have been investigated in numerous papers ([BDJ], [BS], [BM], [GW],
[ITW1], [ITW2], [Jo], [TW3], ...). For instance, appropriately renormalized and for uniform
draws, $V_1(n,m)$ converges in law, as $n$ goes to infinity and $m$ is fixed, to the largest eigenvalue of
an $m \times m$ matrix from the traceless Gaussian Unitary Ensemble (GUE). More generally (see [Jo]),
when $n \to +\infty$ (and $m$ is fixed), the shape of the whole Young tableaux associated to a uniform
random word converges, after renormalization, to the law of the spectrum of an $m \times m$ traceless
GUE matrix. For different random words, such as non-uniform or Markovian ones, the situation
is more involved ([ITW1], [ITW2], [HL3], [HX], [CG]).
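
The identification of the top row with the longest weakly increasing subsequence can be checked on small examples. The following is a minimal Python sketch, assuming letters are encoded as integers; the function names `rsk_shape` and `longest_weakly_increasing` are illustrative and are not taken from the references. Row insertion bumps the first entry strictly greater than the incoming letter, which is the convention that keeps rows weakly increasing.

```python
import bisect
import random


def rsk_shape(word):
    """Shape of the RSK tableau of `word` via row insertion (weakly increasing rows)."""
    rows = []
    for letter in word:
        x = letter
        for row in rows:
            # Position of the first entry strictly greater than x.
            pos = bisect.bisect_right(row, x)
            if pos == len(row):
                row.append(x)  # x fits at the end of this row
                break
            row[pos], x = x, row[pos]  # bump the displaced entry to the next row
        else:
            rows.append([x])  # the bumped entry (or first letter) starts a new row
    return [len(row) for row in rows]


def longest_weakly_increasing(word):
    """Length of the longest weakly increasing subsequence (patience sorting)."""
    tails = []
    for x in word:
        pos = bisect.bisect_right(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)


if __name__ == "__main__":
    rng = random.Random(0)
    n, m = 200, 5  # illustrative word length and alphabet size
    word = [rng.randint(1, m) for _ in range(n)]
    print(rsk_shape(word), longest_weakly_increasing(word))
```

On any word, the first entry of the returned shape should agree with `longest_weakly_increasing`, and the shape has at most $m$ rows.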

For independently and uniformly drawn random words, the following result holds, where, below
and in the sequel, $\Rightarrow$ stands for convergence in distribution.

Theorem 1 Let $V_k(n,m) = \sum_{i=1}^k R_n^i$ be the sum of the lengths of the first $k$ rows of the Young
tableau. Then,

$$\left(\frac{V_k(n,m) - kn/m}{\sqrt{n}}\right)_{1\le k\le m} \Rightarrow \left(\frac{\sqrt{m-1}}{m}\max_{t\in I_{k,m}}\sum_{j=1}^k\sum_{l=j}^{m-k+j}\big(\hat B^l(t_{j,l}) - \hat B^l(t_{j,l-1})\big)\right)_{1\le k\le m}, \qquad (1)$$

where $(\hat B^1, \dots, \hat B^m)$ is a multidimensional Brownian motion with covariance matrix having
diagonal terms equal to 1 and off-diagonal terms equal to $-1/(m-1)$, and where $I_{k,m}$ is defined
by

$$I_{k,m} = \big\{t = (t_{j,l} : 1\le j\le k,\ 0\le l\le m) : t_{j,j-1} = 0,\ t_{j,m-k+j} = 1,\ 1\le j\le k,$$
$$\qquad t_{j,l-1}\le t_{j,l},\ 1\le j\le k,\ 1\le l\le m-1;\ t_{j,l}\le t_{j-1,l},\ 2\le j\le k,\ 1\le l\le m-1\big\}.$$

Here, and in the sequel, the rows beyond the height of the tableau are considered to be of length
zero. If we let $\Theta_k : \mathbb{R}^k \to \mathbb{R}^k$ be defined via $(\Theta_k(x))_j = \sum_{i=1}^j x_i$, $1\le j\le k$, then the shape
of the Young tableau is given by $\Theta_m^{-1}((V_1(n,m), \dots, V_m(n,m))^t) = (R_n^1, \dots, R_n^m)^t$. Moreover, let
$(\lambda^{1,0}_{GUE,m}, \lambda^{2,0}_{GUE,m}, \dots, \lambda^{m,0}_{GUE,m})$ be the spectrum, written in non-increasing order, of an $m\times m$
traceless element of the GUE, where the GUE is equipped with the measure

$$\frac{1}{C_m}\prod_{1\le i<j\le m}(x_i - x_j)^2\prod_{j=1}^m e^{-x_j^2/2}\,dx_j,$$

and $C_m = (2\pi)^{m/2}\prod_{j=1}^m j!$ (see [Me]). An important fact (see [Ba], [BJ], [Do], [GTW], [HL3],
[OCY]) asserts that

y J k m-k+j 

~ \'^GUE,rm "^GUE.m^ • ■ • ' '^Gt/E.mj- {■^J 

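In fact, if $(\lambda^1_{GUE,m}, \lambda^2_{GUE,m}, \dots, \lambda^m_{GUE,m})$ is the (ordered) spectrum of an $m\times m$ element of the
GUE, then

$$(\lambda^1_{GUE,m}, \lambda^2_{GUE,m}, \dots, \lambda^m_{GUE,m}) \overset{\mathcal{L}}{=} (\lambda^{1,0}_{GUE,m}, \lambda^{2,0}_{GUE,m}, \dots, \lambda^{m,0}_{GUE,m}) + Z_m e_m, \qquad (3)$$

where $Z_m$ is a centered Gaussian random variable with variance $1/m$, independent of the vector
$(\lambda^{1,0}_{GUE,m}, \lambda^{2,0}_{GUE,m}, \dots, \lambda^{m,0}_{GUE,m})$, and where $e_m = (1, 1, \dots, 1)$; see [HX] for simple proofs of (2)
and (3).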

Finally, recall that, as $m \to +\infty$, the asymptotic behavior of the spectrum of the GUE has
been obtained by Tracy and Widom (see [TW1], [TW2] and also Theorem 1.4 in [Jo], with a slight
change of the notation):

Theorem 2 For each $r \ge 1$, there is a distribution $F_r$ on $\mathbb{R}^r$ such that

$$\big(m^{1/6}(\lambda^k_{GUE,m} - 2\sqrt{m})\big)_{1\le k\le r} \Rightarrow F_r, \quad m \to +\infty. \qquad (4)$$

Remark 3 The distribution $F_r$ is explicitly known (see (3.48) in [Jo]) and its first marginal
coincides with the Tracy-Widom distribution.

Since $Z_m m^{1/6} \Rightarrow 0$ as $m \to +\infty$, taking successively the limits in $n$ and then in $m$, (1)-(4)
entail, for each $r \ge 1$, that

$$\lim_{m\to+\infty}\lim_{n\to+\infty}\left(\frac{V_k(n,m) - kn/m - 2k\sqrt{n}}{\sqrt{n}}\times m^{2/3}\right)_{1\le k\le r} = F_r\Theta_r^{-1}. \qquad (5)$$

Following the universality argument in percolation models developed by Bodineau and Martin ([BM]),
we show below that the limits in $n$ and $m$ in (5) can be explicitly taken simultaneously when the
size $m$ of the alphabet does not grow too fast with respect to $n$. Doing so, we are dealing with
growing ordered alphabets and at each step, the $n$ letters $X_i^m$, $1\le i\le n$, are redrawn (and not
just the $n$th letter as in the case with the model studied in [HL2]). In a way, we are thus giving the
fluctuations of the shape of the Young tableau of a random word when the alphabets are growing
and are reshuffled. In the sequel, $m$ will be a function $m(n)$ of $n$. However, in order to simplify
the notation, we shall still write $m$ instead of $m(n)$. A main result of this note is the following:

Theorem 4 Let $m$ tend to infinity as $n \to +\infty$ in such a way that $m = o(n^{3/10}(\log n)^{-3/5})$.
Then, for each $r \ge 1$,

$$\left(\frac{V_k(n,m) - kn/m - 2k\sqrt{n}}{n^{1/2}m^{-2/3}}\right)_{1\le k\le r} \Rightarrow F_r\Theta_r^{-1}.$$



Remark 8 below briefly discusses the growth conditions on $m$. Since, again, the length of the
first row of the Young tableau is the length $V_1(n,m)$ of the longest increasing subsequence and
since the first marginal of $F_r$ is the Tracy-Widom distribution $F_{TW}$, we have the following result:

Corollary 5 Let $m$ tend to infinity as $n \to +\infty$ in such a way that $m = o(n^{3/10}(\log n)^{-3/5})$.
Then

$$\frac{V_1(n,m) - (n/m) - 2\sqrt{n}}{n^{1/2}m^{-2/3}} \Rightarrow F_{TW}.$$

When the independent random letters are no longer uniformly drawn, a similar asymptotic
behavior continues to hold for $V_1(n,m)$, as explained next. Let the $X_i^m$, $1\le i\le n$, be independently
and identically distributed with $P(X_1^m = \alpha_j) = p_j^m$, let $p^m_{max} = \max_{1\le j\le m} p_j^m$ and also let $J(m) =
\{j : p_j^m = p^m_{max}\} = \{j_1, \dots, j_{k(m)}\}$ with $k(m) = \mathrm{card}(J(m))$. Now, from [HL1] and as $n \to +\infty$,
the behavior of the first row of the Young tableau in this non-uniform setting is given by

$$\frac{V_1(n,m) - p^m_{max}n}{\sqrt{p^m_{max}n}} \Rightarrow \frac{\sqrt{1 - k(m)p^m_{max}} - 1}{k(m)}\sum_{l=1}^{k(m)} B^l(1) + \max_{0=t_0\le t_1\le\cdots\le t_{k(m)-1}\le t_{k(m)}=1}\sum_{l=1}^{k(m)}\big(B^l(t_l) - B^l(t_{l-1})\big), \qquad (6)$$

where $(B^1, \dots, B^{k(m)})$ is a standard $k(m)$-dimensional Brownian motion. For the limiting behavior
in $m$ of the right-hand side of (6), as explained next, two cases can arise, depending on the number
of most probable letters in $\mathcal{A}_m$. Setting

$$Z_k = \frac{1}{k}\sum_{l=1}^k B^l(1) \quad\text{and}\quad D_k = \max_{0=t_0\le t_1\le\cdots\le t_{k-1}\le t_k=1}\sum_{l=1}^k\big(B^l(t_l) - B^l(t_{l-1})\big),$$

and combining (2) and (4), as well as Remark 3 when $k = 1$, and since, clearly, $Z_k \sim
\mathcal{N}(0, 1/k)$, we have

$$k^{1/6}(D_k - 2\sqrt{k}) \Rightarrow F_{TW}, \quad k \to +\infty. \qquad (7)$$

First, let $k(m)$ be bounded. Possibly extracting a subsequence, we can assume that $k(m)$ is
equal to a fixed $k \in \mathbb{N}\setminus\{0\}$ and since $p^m_{max} \in [0,1]$, we can also assume that $p^m_{max} \to p_{max}$. In
this case, taking the limit first in $n$ and next in $m$ yields

$$\frac{V_1(n,m) - p^m_{max}n}{\sqrt{p^m_{max}n}} \Rightarrow \big(\sqrt{1 - kp_{max}} - 1\big)Z_k + D_k. \qquad (8)$$

The limiting distribution on the right-hand side of (8) depends on $k$. For instance, for $k = 1$,
we recover a Gaussian distribution, while for $k > 1$ and a specific choice of the $p^m_{max}$ for which
$\lim_{m\to+\infty} p^m_{max} = 0$, we recover (8) without the Gaussian term. Thus, in general, when $k(m)$ is
bounded, there is no global asymptotics, but only convergence (to different distributions) along
subsequences.

Next, let $k(m) \to +\infty$. In this case, in (6), the Gaussian contribution is negligible. Indeed,
since $(\sqrt{1 - k(m)p^m_{max}} - 1)^2 k(m)^{-2/3} \le (k(m)p^m_{max})^2 k(m)^{-2/3} \le k(m)^{-2/3} \to 0$, when $m \to +\infty$,

$$\big(\sqrt{1 - k(m)p^m_{max}} - 1\big)Z_{k(m)}k(m)^{1/6} \sim \mathcal{N}\Big(0, \big(\sqrt{1 - k(m)p^m_{max}} - 1\big)^2 k(m)^{-2/3}\Big) \Rightarrow 0.$$

Hence, plugging the convergence result (7) into (6) leads to

$$\left(\frac{V_1(n,m) - p^m_{max}n}{\sqrt{p^m_{max}n}} - 2\sqrt{k(m)}\right)k(m)^{1/6} \Rightarrow F_{TW}, \qquad (9)$$

where the limit is first taken as $n \to +\infty$ and then as $m \to +\infty$. In this non-uniform setting,
we have the following counterpart to Corollary 5 with an additional control on the second largest
probability for the letters of $\mathcal{A}_m$. More precisely, let $p^m_{2nd} = \max(p_j^m < p^m_{max} : 1\le j\le m)$.

Theorem 6 Let the size $m$ of the alphabets vary with $n$ and assume that $k(m(n))$, the number
of most probable letters in $\mathcal{A}_m$, goes to infinity when $n \to +\infty$, in such a way that $k(m(n)) =
o(n^{3/10}(\log n)^{-3/5})$. Assume, moreover, that

$$\big(p^{m(n)}_{2nd}\big)^2\,\frac{n^{11/10}}{(\log n)^{1/5}} = O\big(p^{m(n)}_{max}\big). \qquad (10)$$

Then

$$\frac{V_1(n,m(n)) - p^{m(n)}_{max}n - 2\sqrt{k(m(n))p^{m(n)}_{max}n}}{\sqrt{k(m(n))p^{m(n)}_{max}n}}\times k(m(n))^{2/3} \Rightarrow F_{TW}. \qquad (11)$$

Let us again stress the fact that in the previous result, $m$ is a function of $n$, with the only
requirement being that $k(m(n)) = o(n^{3/10}(\log n)^{-3/5})$. Note that in the uniform case, $k(m) = m$
and $p^m_{max} = 1/m$ and that, in general, $1/m \le p^m_{max} \le 1/k(m)$.

Let us now put our results in context, relate them to the current literature and also describe 
the main steps in the arguments developed below. 

Bodineau and Martin [BM] showed that the fluctuations of the last-passage directed percolation
model with Gaussian i.i.d. weights actually extend to i.i.d. weights with finite (2 + r)th moment,
r > 0. Their arguments rely, in part, on a KMT approximation which was already used by Glynn
and Whitt [GW] in a related queueing model.

Here, we closely follow [BM] and take advantage of the representation (2) of the spectrum of a
matrix in the GUE. Using Brownian scaling in those Brownian functionals, we can mix together $n$
and $m$ in the corresponding limit (4) (see (14) below). Then, exhibiting an expression similar to
(12), but with dependent Bernoulli random variables, for the shape of the Young tableau (see (17)),
we show via a Gaussian approximation that the Bernoulli functionals stay close to the Brownian
functionals (see (22)), so as to share the same asymptotics.

Since we apply a Gaussian approximation to Bernoulli random variables with a strong integrability
property, the strong approximation can be made more precise than in [BM]. However, this is
not enough to obtain the fluctuations for $m$ of larger order. Actually the Gaussian approximation
is responsible for the condition $m = o(n^{3/10}(\log n)^{-3/5})$, which falls short of the corresponding
polynomial order condition $m = o(n^{3/7})$ obtained in [BM]. However, in contrast to [BM], the
stronger integrability property of the Bernoulli random variables and the stronger condition on
$m$ are required to control the constants appearing in the Gaussian approximation applied to a
triangular scheme of different distributions.

Using Skorokhod embedding, Baik and Suidan [BS] derived, independently of [BM], similar
convergence results (see Theorem 2 in [BS]), under the condition $m = o(n^{3/14})$. See also [Su] for

related results (under m = o(n^/^)) in percolation models using functional methods in the CLT. 

Finally, note that [BM], [BS], [Su] deal with percolation models with i.i.d. random variables under
enough polynomial integrability. In our setting, the lengths of the rows of the Young tableaux
associated to random words are expressed in terms of dependent (exchangeable in the uniform
case) Bernoulli random variables. We are thus working with much more specific random variables,
but without complete independence.

The paper is organized as follows: Section 2 is devoted to the proof of Theorem 4 while we
sketch the changes needed to prove Theorem 6 in Section 3. We conclude in Section 4 with some
remarks on the convergence of the whole shape of Young tableaux when the draws are non-uniform.

2 Proof of Theorem 4

Brownian scaling. Let $(B^l(s))_{s\ge0}$, $1\le l\le m$, be independent standard Brownian motions.

For $s > 0$, $m \ge 1$ and $k \ge 1$, let

$$L_k(s,m) = \sup_{t\in I_{k,m}(s)}\sum_{j=1}^k\sum_{l=j}^{m-k+j}\big(B^l(t_{j,l}) - B^l(t_{j,l-1})\big), \qquad (12)$$

where $I_{k,m}(s) = \{st,\ t\in I_{k,m}\}$. For $k = 1$, $L_1(s,m)$ coincides with the Brownian percolation model
used in [BM]; see also [GW] for a related queueing model. For $s = 1$, $\Theta_m^{-1}((L_k(1,m))_{1\le k\le m})$ has
the same law as the spectrum of an $m\times m$ GUE matrix; see [Do] and [HX].
Since $(L_1(\cdot,m), \dots, L_m(\cdot,m))$ is a continuous function of $B^1, \dots, B^m$, which are independent,
Brownian scaling entails that

$$(L_1(s,m), \dots, L_m(s,m)) \overset{\mathcal{L}}{=} \sqrt{s}\,(L_1(1,m), \dots, L_m(1,m)). \qquad (13)$$
Plugging (13) into (4) yields, as $m \to +\infty$,

$$\left(\frac{L_k(n,m) - 2k\sqrt{nm}}{n^{1/2}m^{-1/6}}\right)_{1\le k\le r} \Rightarrow F_r\Theta_r^{-1}. \qquad (14)$$

Combinatorics. Let

$$X^m_{i,j} = \begin{cases} 1 & \text{if } X_i^m = \alpha_j,\\ 0 & \text{otherwise,}\end{cases}$$

be Bernoulli random variables with parameter $P(X_i^m = \alpha_j) = 1/m$ and variance $\sigma_m^2 = (1/m)(1 -
1/m)$. For a fixed $1\le j\le m$, the $X^m_{i,j}$s are independent and identically distributed, while for $j \ne j'$,
$(X^m_{1,j}, \dots, X^m_{n,j})$ and $(X^m_{1,j'}, \dots, X^m_{n,j'})$ are identically distributed, but no longer independent.

Again, recall that the length of the first row of the Young tableau of a random word is the
length of the longest (weakly) increasing subsequence of $X_1^m, \dots, X_n^m$.

Let $S_k^{m,j} = \sum_{i=1}^k X^m_{i,j}$ be the number of occurrences of $\alpha_j$ among $(X_i^m)_{1\le i\le k}$. An increasing
subsequence of $(X_i^m)_{1\le i\le k}$ consists of successive blocks, each one made of an identical letter, with
the sequence of letters representing each block being strictly increasing. Since, for $1\le k\le\ell\le n$,
the number of occurrences of $\alpha_j$ among $(X_i^m)_{k<i\le\ell}$ is $S_\ell^{m,j} - S_k^{m,j}$, it follows that

$$V_1(n,m) = \max_{0=l_0\le l_1\le\cdots\le l_{m-1}\le l_m=n}\Big[\big(S^{m,1}_{l_1} - S^{m,1}_{l_0}\big) + \big(S^{m,2}_{l_2} - S^{m,2}_{l_1}\big) + \cdots + \big(S^{m,m}_{l_m} - S^{m,m}_{l_{m-1}}\big)\Big], \qquad (15)$$

with the convention that $S_0^{m,j} = 0$. More involved combinatorial arguments yield the following
expression for $V_k(n,m)$ (see Theorem 5.1 in [HL3]):

$$V_k(n,m) = \max_{\mathbf{k}\in J_{k,m}(n)}\sum_{j=1}^k\sum_{l=j}^{m-k+j}\big(S^{m,l}_{k_{j,l}} - S^{m,l}_{k_{j,l-1}}\big), \qquad (16)$$

where

$$J_{r,m}(n) = \big\{\mathbf{k} = (k_{j,l} : 1\le j\le r,\ 0\le l\le m) : k_{j,j-1} = 0,\ k_{j,m-r+j} = n,\ 1\le j\le r,$$
$$\qquad k_{j,l-1}\le k_{j,l},\ 1\le j\le r,\ 1\le l\le m-1;\ k_{j,l}\le k_{j-1,l},\ 2\le j\le r,\ 1\le l\le m-1\big\}.$$

For $t\in I_{r,m}(n)$, set $[t] = ([t_{j,l}] : 1\le j\le r,\ 0\le l\le m) \in J_{r,m}(n)$ and thus

$$V_k(n,m) = \sup_{t\in I_{k,m}(n)}\sum_{j=1}^k\sum_{l=j}^{m-k+j}\big(S^{m,l}_{[t_{j,l}]} - S^{m,l}_{[t_{j,l-1}]}\big), \qquad (17)$$

which is to be compared with (12) for Brownian functionals.
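
The block representation (15) of the first row can be evaluated directly by dynamic programming over the letters. The sketch below is only an illustration under the assumption that letters are encoded as the integers 1..m; `v1_block_formula` and `lwis_quadratic` are hypothetical helper names, and the second function is a plain quadratic-time reference used to cross-check the first.

```python
import random


def v1_block_formula(word, m):
    """Evaluate V_1 as the maximum in (15) by dynamic programming over letters 1..m."""
    n = len(word)
    neg_inf = float("-inf")
    # best[l] = best value using the first j blocks when the j-th block ends at l (l_j = l).
    best = [0] + [neg_inf] * n  # j = 0: only l_0 = 0 is allowed
    for j in range(1, m + 1):
        # counts[l] = number of occurrences of letter j among the first l letters.
        counts = [0] * (n + 1)
        for i, x in enumerate(word, start=1):
            counts[i] = counts[i - 1] + (1 if x == j else 0)
        new_best = [neg_inf] * (n + 1)
        running = neg_inf  # max over l' <= l of best[l'] - counts[l']
        for l in range(n + 1):
            running = max(running, best[l] - counts[l])
            new_best[l] = running + counts[l]
        best = new_best
    return best[n]


def lwis_quadratic(word):
    """Reference O(n^2) length of the longest weakly increasing subsequence."""
    longest_ending_at = []
    for i, x in enumerate(word):
        prev = [longest_ending_at[k] for k in range(i) if word[k] <= x]
        longest_ending_at.append(1 + max(prev, default=0))
    return max(longest_ending_at, default=0)


if __name__ == "__main__":
    rng = random.Random(0)
    word = [rng.randint(1, 4) for _ in range(40)]
    print(v1_block_formula(word, 4), lwis_quadratic(word))
```

Both functions should return the same value on every word, which is the content of (15).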

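Centering and reducing. Let $\tilde X^m_{i,j} = (X^m_{i,j} - 1/m)/\sigma_m$ and $\tilde S_k^{m,j} = \sum_{i=1}^k\tilde X^m_{i,j}$ and, replacing
$X^m_{i,j}$ by $\tilde X^m_{i,j}$, similarly define $\tilde V_k(n,m)$. Clearly, $V_k(n,m) = \sigma_m\tilde V_k(n,m) + kn/m$, hence,

$$\frac{V_k(n,m) - kn/m - 2k\sqrt{n}}{\sqrt{n}}\times m^{2/3}
= \frac{\sigma_m\tilde V_k(n,m) - 2k\sqrt{n}}{\sqrt{n}}\times m^{2/3}
= \frac{\tilde V_k(n,m) - 2k\sigma_m^{-1}\sqrt{n}}{\sqrt{n}}\times\big(\sigma_m m^{2/3}\big)$$
$$= \frac{\tilde V_k(n,m) - 2k\sqrt{nm} + 2k\sqrt{n}\,(\sqrt{m} - \sigma_m^{-1})}{n^{1/2}m^{-1/6}}\times\big(m^{1/2}\sigma_m\big).$$

Note that $\sigma_m^{-1} - m^{1/2} \sim 1/(2\sqrt{m})$, and that $m^{1/2}\sigma_m \to 1$, and so the limit under study is
the same as that of

$$\frac{\tilde V_k(n,m) - 2k\sqrt{nm}}{n^{1/2}m^{-1/6}}. \qquad (18)$$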
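
The two asymptotic facts used in this recentering, namely that $\sigma_m^{-1} - m^{1/2}$ behaves like $1/(2\sqrt{m})$ and that $m^{1/2}\sigma_m$ tends to 1, follow from $\sigma_m^2 = (1/m)(1 - 1/m)$ and are easy to sanity-check numerically. The short sketch below does only that; the function names are illustrative.

```python
import math


def sigma(m):
    """Standard deviation of a Bernoulli(1/m) indicator: sqrt((1/m)(1 - 1/m))."""
    return math.sqrt((1.0 / m) * (1.0 - 1.0 / m))


def centering_gap(m):
    """The difference sigma_m^{-1} - sqrt(m) that appears in the recentering step."""
    return 1.0 / sigma(m) - math.sqrt(m)


if __name__ == "__main__":
    for m in (10, 100, 10_000):
        # Both printed ratios should approach 1 as m grows.
        print(m, centering_gap(m) * 2.0 * math.sqrt(m), math.sqrt(m) * sigma(m))
```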

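Bound. Next, and as in [BM], we bound the difference between $\tilde V_k(n,m)$ and $L_k(n,m)$. This
bound holds true for any Brownian motions $(B^{m,l}(t))_{t\ge0}$ but it will only be correctly controlled for
a special choice of the Brownian motions and for copies of the random variables $\tilde X^m_{i,j}$ given by a
coupling (using a strong approximation result, see Proposition 7 below).

$$\begin{aligned}
|\tilde V_k(n,m) - L_k(n,m)|
&= \Big|\sup_{t\in I_{k,m}(n)}\sum_{j=1}^k\sum_{l=j}^{m-k+j}\big(\tilde S^{m,l}_{[t_{j,l}]} - \tilde S^{m,l}_{[t_{j,l-1}]}\big) - \sup_{t\in I_{k,m}(n)}\sum_{j=1}^k\sum_{l=j}^{m-k+j}\big(B^l(t_{j,l}) - B^l(t_{j,l-1})\big)\Big| \\
&\le \sup_{t\in I_{k,m}(n)}\Big|\sum_{j=1}^k\sum_{l=j}^{m-k+j}\big(\tilde S^{m,l}_{[t_{j,l}]} - B^l(t_{j,l})\big) - \sum_{j=1}^k\sum_{l=j}^{m-k+j}\big(\tilde S^{m,l}_{[t_{j,l-1}]} - B^l(t_{j,l-1})\big)\Big| \\
&\le \sum_{j=1}^k\sum_{l=j}^{m-k+j} 2\sup_{0\le u\le n}\Big(\big|\tilde S^{m,l}_{[u]} - B^l([u])\big| + \big|B^l([u]) - B^l(u)\big|\Big) \\
&\le 2k\sum_{l=1}^m\big(Y_n^{m,l} + W_n^l\big), \qquad (19)
\end{aligned}$$

where we set

$$Y_n^{m,l} = \max_{1\le i\le n}\big|\tilde S_i^{m,l} - B^l(i)\big| \quad\text{and}\quad W_n^l = \sup_{\substack{0\le s,t\le n\\ |s-t|\le1}}\big|B^l(s) - B^l(t)\big|.$$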


Gaussian approximation. From now on, we assume that for each $n$ and $l\in[1,m]$ (recall
that $m = m(n)$), the random variables $\tilde X^m_{i,l}$, $1\le i\le n$, and the Brownian motion $(B^l(s))_{s\in[0,n+1]}$
appearing in $Y_n^{m,l}$ and $W_n^l$ (rewritten as $(B^{m,l}(s))_{s\in[0,n+1]}$), are given by the following result, which
is a compilation of strong approximation results of Komlós, Major, Tusnády and of Sakhanenko
and for which we refer to [Li] (Theorem 2.1, Corollary 3.2) and the references therein. In the
sequel, we write $B^{m,l}$ and $W_n^{m,l}$, instead of $B^l$ and $W_n^l$, to insist on the dependence in $m$ of the
random variables given by the following proposition.

Proposition 7 Let $(X_n)_{n\ge1}$ be a sequence of i.i.d. random variables with common distribution
$F$ having finite exponential moments. Then, on a common probability space and for every $N$,
one can construct a sequence $(\tilde X_n)_{1\le n\le N}$ having the same law as $(X_n)_{1\le n\le N}$, and independent
Gaussian variables $(Y_n)_{1\le n\le N}$ having the same expectations and variances as $(X_n)_{1\le n\le N}$, such
that for every $x > 0$:

$$P\Big(\max_{1\le k\le N}\Big|\sum_{i=1}^k\tilde X_i - \sum_{j=1}^k Y_j\Big| > x\Big) \le \big(1 + c_2(F)N^{1/2}\big)\exp(-c_1(F)x),$$

where $c_1(F)$ and $c_2(F)$ are positive constants (depending on $F$). Moreover $c_1(F) = c_3\lambda(F)$ and
$c_2(F) = \lambda(F)\mathrm{Var}(X_1)^{1/2}$, where $c_3$ is an absolute constant and $\lambda(F)$ is given by

$$\lambda(F) = \sup\big\{\lambda > 0 : \lambda E\big[|X_1 - E[X_1]|^3\exp(\lambda|X_1 - E[X_1]|)\big] \le E\big[|X_1 - E[X_1]|^2\big]\big\}.$$
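
For a Bernoulli variable with parameter $1/m$, the quantity $\lambda(F)$ of Proposition 7 can be evaluated numerically, since the left-hand side of the defining inequality is increasing in $\lambda$. The following is a small sketch under that assumption (bisection on the interval [0, 4], which is an illustrative choice); `lambda_bernoulli` is a hypothetical name. It can be used to observe that the value stays between 1/2 and 2.

```python
import math


def lambda_bernoulli(m, tol=1e-12):
    """Numerically evaluate lambda(F) of Proposition 7 for X ~ Bernoulli(1/m), m >= 2."""
    p = 1.0 / m
    # |X - E[X]| equals 1 - p with probability p, and p with probability 1 - p.
    atoms = [(1.0 - p, p), (p, 1.0 - p)]
    second_moment = sum(w * a ** 2 for a, w in atoms)

    def excess(lam):
        lhs = lam * sum(w * a ** 3 * math.exp(lam * a) for a, w in atoms)
        return lhs - second_moment

    lo, hi = 0.0, 4.0  # excess(0) < 0 < excess(4), and excess is increasing in lam
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) <= 0.0:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    for m in (2, 5, 100, 10_000):
        print(m, lambda_bernoulli(m))
```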

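The strong approximation entails the following bound for the tail of $Y_n^{m,l}$:

$$P(Y_n^{m,l} > x) \le \big(1 + c_2(m)n^{1/2}\big)\exp(-c_1(m)x), \qquad (20)$$

where $c_1(m) = c_3\lambda(\tilde X^m_{1,1})$ and $c_2(m) = \lambda(\tilde X^m_{1,1})\mathrm{Var}(\tilde X^m_{1,1})^{1/2}$. Observe that $\lambda(\tilde X^m_{1,1}) = \sigma_m\lambda(X^m_{1,1} -
E[X^m_{1,1}])$ and note that $\lambda(X^m_{1,1}) \in [2^{-1}, 2]$. Indeed, for $\lambda > 2$,

$$E\big[|X^m_{1,1} - E[X^m_{1,1}]|^2\big] = \frac1m\Big(1 - \frac1m\Big) \le \frac2m\Big(1 - \frac1m\Big)\Big(\Big(1 - \frac1m\Big)^2 + \frac1{m^2}\Big) < \lambda E\big[|X^m_{1,1} - E[X^m_{1,1}]|^3\exp(\lambda|X^m_{1,1} - E[X^m_{1,1}]|)\big],$$

while, since $|X^m_{1,1} - E[X^m_{1,1}]| \le 1$, for $\lambda \le 1/2$,

$$\lambda E\big[|X^m_{1,1} - E[X^m_{1,1}]|^3\exp(\lambda|X^m_{1,1} - E[X^m_{1,1}]|)\big] \le \frac12\exp\Big(\frac12\Big)E\big[|X^m_{1,1} - E[X^m_{1,1}]|^3\big] \le E\big[|X^m_{1,1} - E[X^m_{1,1}]|^2\big].$$

Thus, $c_1(m)$ and $c_2(m)$ behave like $1/\sqrt{m}$. Also, note that the bound in (20) is non-trivial for
$x > a_n^0 := \log(1 + c_2(m)n^{1/2})/c_1(m)$.

Approximating sets. Let $A_n^1 = \{\max_{l\le m} Y_n^{m,l} > a_n\}$, for some $a_n = Cc_1(m)^{-1}\log n \ge
a_n^0$, where $C$ is some finite constant. We have

$$P(A_n^1) = P\Big(\bigcup_{l\le m}\{Y_n^{m,l} > a_n\}\Big) \le \sum_{l\le m}P(Y_n^{m,l} > a_n) \le m\,e^{-c_1(m)a_n}\big(1 + c_2(m)n^{1/2}\big) = m\,n^{-C}\big(1 + c_2(m)n^{1/2}\big),$$

which goes to zero as $n \to +\infty$ as soon as $C$ is large enough.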



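Let $A_n^2 = \{\max_{1\le l\le m} W_n^{m,l} > b_n\}$, for $b_n = \log n$. Standard estimates (including reflection
principle, Brownian scaling and Gaussian tail estimates) lead to

$$P(A_n^2) = P\Big(\bigcup_{l\le m}\{W_n^{m,l} > b_n\}\Big) \le mP(W_n^{m,1} > b_n) = mP\Big(\sup_{\substack{0\le s,t\le n\\ |s-t|\le1}}\big|B_s^{m,1} - B_t^{m,1}\big| > b_n\Big).$$

However,

$$\sup_{\substack{0\le s,t\le n\\ |s-t|\le1}}\big|B_s^{m,1} - B_t^{m,1}\big| \le \sup_{0\le i\le n-2}\ \sup_{i\le s,t\le i+2}\big|B_s^{m,1} - B_t^{m,1}\big| = \sup_{0\le i\le n-2}\Big(\sup_{i\le t\le i+2}B_t^{m,1} - \inf_{i\le s\le i+2}B_s^{m,1}\Big),$$

and so

$$\begin{aligned}
P(A_n^2) &\le mP\Big(\sup_{0\le i\le n-2}\Big(\sup_{i\le t\le i+2}B_t^{m,1} - \inf_{i\le s\le i+2}B_s^{m,1}\Big) > b_n\Big) \\
&\le mnP\Big(\sup_{t\in[0,2]}B_t^{m,1} - \inf_{s\in[0,2]}B_s^{m,1} > b_n\Big) \\
&\le mn\Big(P\Big(\sup_{t\in[0,2]}B_t^{m,1} > b_n/2\Big) + P\Big(\sup_{s\in[0,2]}\big(-B_s^{m,1}\big) > b_n/2\Big)\Big) \\
&\le 4mn\exp(-b_n^2/16) \to 0, \quad n \to +\infty. \qquad (21)
\end{aligned}$$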
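
The bound $4mn\exp(-b_n^2/16)$ with $b_n = \log n$ does tend to zero, but only once $(\log n)^2/16$ dominates the logarithm of $4mn$, which requires rather large $n$. The sketch below, which assumes the illustrative polynomial choice $m = n^\alpha$ (not imposed by the text), evaluates the logarithm of the bound to make this visible; `log_tail_bound` is a hypothetical name.

```python
import math


def log_tail_bound(n, alpha):
    """Natural log of 4 * m * n * exp(-b_n**2 / 16) with m = n**alpha and b_n = log(n)."""
    log_n = math.log(n)
    return math.log(4.0) + (1.0 + alpha) * log_n - log_n ** 2 / 16.0


if __name__ == "__main__":
    for exponent in (6, 10, 20, 40):
        n = 10.0 ** exponent
        print(f"n = 1e{exponent}: log of bound = {log_tail_bound(n, 0.3):.2f}")
```

A positive value means the bound is still larger than 1 (hence uninformative) at that $n$; the values become negative and decrease for larger $n$.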

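Final bound. From (14), the approximation of $(\tilde V_k(n,m))_{1\le k\le r}$ by $(L_k(n,m))_{1\le k\le r}$ will
imply the theorem if

$$P\Big(\sum_{k=1}^r\big|\tilde V_k(n,m) - L_k(n,m)\big| \ge c_n\Big) \to 0, \quad n \to +\infty, \qquad (22)$$

for some

$$c_n = o(n^{1/2}m^{-1/6}). \qquad (23)$$

Since $\lim_{n\to+\infty}\big(P(A_n^1) + P(A_n^2)\big) = 0$, it is enough to prove that

$$\lim_{n\to+\infty}P\Big(\Big\{\sum_{k=1}^r\big|\tilde V_k(n,m) - L_k(n,m)\big| \ge c_n\Big\}\cap(A_n^1)^c\cap(A_n^2)^c\Big) = 0. \qquad (24)$$

But, using (19), (20) and the lower bound $c_1(m) \ge c_3\sigma_m/2 \ge c_3/(2^{3/2}m^{1/2})$, valid for $m \ge 2$,

$$\begin{aligned}
E\Big[\sum_{k=1}^r\big|\tilde V_k(n,m) - L_k(n,m)\big|\,\mathbf{1}_{(A_n^1)^c\cap(A_n^2)^c}\Big]
&\le \sum_{k=1}^r 2k\sum_{l=1}^m E\big[(Y_n^{m,l} + W_n^{m,l})\,\mathbf{1}_{(A_n^1)^c\cap(A_n^2)^c}\big] \\
&\le 2r^2m\,\big(E\big[Y_n^{m,1}\mathbf{1}_{\{Y_n^{m,1}\le a_n\}}\big] + b_n\big) \\
&\le 2r^2m\,\big(E\big[(Y_n^{m,1} - a_n^0)\mathbf{1}_{\{a_n^0\le Y_n^{m,1}\le a_n\}}\big] + a_n^0 + b_n\big) \\
&\le 2r^2m\,\Big(\int_{a_n^0}^{+\infty}P(Y_n^{m,1} > x)\,dx + a_n^0 + b_n\Big) \\
&\le 2r^2m\,\Big(\frac{1}{c_1(m)} + a_n^0 + b_n\Big) \\
&\le \frac{2^{5/2}r^2m^{3/2}}{c_3}\big(1 + \log(1 + c_2(m)n^{1/2})\big) + 2r^2m\log n.
\end{aligned}$$

Finally, by Markov's inequality,

$$P\Big(\Big\{\sum_{k=1}^r\big|\tilde V_k(n,m) - L_k(n,m)\big| \ge c_n\Big\}\cap(A_n^1)^c\cap(A_n^2)^c\Big) \le \frac{1}{c_n}\Big(\frac{2^{5/2}r^2m^{3/2}}{c_3}\big(1 + \log(1 + c_2(m)n^{1/2})\big) + 2r^2m\log n\Big). \qquad (25)$$

A choice of $c_n$ ensuring that the bound in (25) goes to zero as $n \to +\infty$ and also compatible with
(23) is possible when $m^{3/2}\log n = o(n^{1/2}m^{-1/6})$, that is, when $m = o(n^{3/10}(\log n)^{-3/5})$. Finally,
(22) and (24) hold true, completing the proof of Theorem 4. □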

Remark 8

• In the above proof, the condition $m = o(n^{3/10}(\log n)^{-3/5})$ is needed only once, to ensure the
compatibility of (23) with the bound (25). However, this is essential to make the Gaussian
approximation work.

• When $m = [n^a]$, the growth condition $m = o(n^{3/10}(\log n)^{-3/5})$ can be rewritten as $a < 3/10$
and this growth condition remains true, in particular, when $m$ is of subpolynomial order.
The condition $a < 3/10$ is stronger than its counterpart $a < 3/7$ in [BM] and this seems to
be due to the fact that we work with a triangular array of random variables.

• For the top line of the tableau, our result falls short of a result of Johansson in [Jo], which
asserts the convergence of $V_1(n,[n^a])$ (properly scaled and normalized) toward the Tracy-
Widom distribution. More precisely, writing $a_n \ll b_n$ for $a_n = o(b_n)$, Theorem 1.7 in [Jo]
actually gives, in our notation, for $\sqrt{n} \ll m$,

$$\frac{V_1(n,m) - n/m - 2\sqrt{n}}{n^{1/6}} \Rightarrow F_{TW},$$

for $(\log n)^{1/6} \ll m \ll \sqrt{n}$,

$$\frac{V_1(n,m) - n/m - 2\sqrt{n}}{n^{1/2}m^{-2/3}} \Rightarrow F_{TW},$$

and, for $\sqrt{n}/m \to l$,

$$\frac{V_1(n,m) - n/m - 2\sqrt{n}}{(1 + l)^{2/3}n^{1/6}} \Rightarrow F_{TW}.$$

In the middle limit above, [Jo, Th. 1.7] requires $(\log n)^{1/6} = o(m)$, while we do not require
a lower bound condition on $m$. Besides, our Theorem 4 applies to the shape of the whole
Young tableau.
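
The growth condition comes from requiring $m^{3/2}\log n = o(n^{1/2}m^{-1/6})$, that is $m^{5/3} = o(n^{1/2}/\log n)$, which gives the benchmark $n^{3/10}(\log n)^{-3/5}$. The sketch below compares a polynomial choice $m = n^a$ (integer part ignored, an illustrative simplification) with this benchmark; the function names are hypothetical.

```python
import math


def growth_threshold(n):
    """The benchmark n**(3/10) * (log n)**(-3/5) from the growth condition on m."""
    return n ** 0.3 * math.log(n) ** (-0.6)


def ratio_to_threshold(n, a):
    """m / threshold for the polynomial choice m = n**a (integer part ignored)."""
    return n ** a / growth_threshold(n)


if __name__ == "__main__":
    for exponent in (6, 12, 100):
        n = 10.0 ** exponent
        # a = 0.25 < 3/10 satisfies the condition; a = 0.35 > 3/10 does not.
        print(exponent, ratio_to_threshold(n, 0.25), ratio_to_threshold(n, 0.35))
```

For $a < 3/10$ the ratio eventually decreases to zero, although slowly because of the logarithmic factor; for $a > 3/10$ it diverges.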



3 Proof of Theorem 6

In this section, we sketch the changes needed in the previous arguments in order to prove
Theorem 6. Note that in the uniform setting, the representation (16) for $V_k(n,m)$ is a maximum
taken over the most probable letters. This is trivially true since, in this case, all the letters have
the same probability. But this property, which appears to be fundamental when we center and
normalize the $X^m_{i,j}$, is no longer true in the non-uniform setting. However, we shall approximate
$V_1(n,m)$ below by a random variable $V_1'(n,m)$ defined as a maximum taken over only the most
probable letters, as in (16); see (28). Part of the remaining work is then to show that we can
suitably control this approximation, and this is done in Lemma 9. This control is at the root of
the extra condition (10) in Theorem 6.

Let us revise our notation for the non-uniform setting. In this section, $X_i^m$, $1\le i\le n$, are
independently and identically distributed with $P(X_1^m = \alpha_j) = p_j^m$. Set $p^m_{max} = \max_{1\le j\le m}p_j^m$
and $J(m) = \{j : p_j^m = p^m_{max}\} = \{j_1, \dots, j_{k(m)}\}$, with $k(m) = \mathrm{card}(J(m))$, and also set
$\sigma_m^2 = p^m_{max}(1 - p^m_{max})$. Finally, note that since $k(m(n))p^{m(n)}_{max} \le 1$ and $k(m(n)) \to +\infty$, it
follows that $p^{m(n)}_{max} \to 0$ as $n \to +\infty$.

Brownian scaling. Let $(B^l(s))_{s\ge0}$, $1\le l\le k(m)$, be independent standard Brownian
motions. For $s > 0$, $m \ge 1$ and $k \ge 1$, let

$$L_1(s,k(m)) = \sup_{t\in I_{k(m)}(s)}\sum_{l=1}^{k(m)}\big(B^l(t_l) - B^l(t_{l-1})\big), \qquad (26)$$

where $I_{k(m)}(s) = \{t : 0 = t_0\le t_1\le\cdots\le t_{k(m)} = s\}$. Recall that $L_1(1,k(m))$ has the
same law as the largest eigenvalue of a $k(m)\times k(m)$ GUE matrix (see (2), (3), (4) and Remark 3
for $k = 1$) and so

$$k^{1/6}\big(L_1(1,k) - 2\sqrt{k}\big) \Rightarrow F_{TW}.$$

By Brownian scaling, $L_1(s,m) \overset{\mathcal{L}}{=} \sqrt{s}\,L_1(1,m)$, so that when $n \to +\infty$,

$$\frac{L_1(n,k(m(n))) - 2\sqrt{nk(m(n))}}{n^{1/2}k(m(n))^{-1/6}} \Rightarrow F_{TW}. \qquad (27)$$
Combinatorics revisited. Let

$$X^m_{i,j} = \begin{cases} 1, & \text{when } X_i^m = \alpha_j,\\ 0, & \text{otherwise,}\end{cases}$$

be Bernoulli random variables with parameter $P(X_i^m = \alpha_j) = p_j^m$ and variance $(\sigma_j^m)^2 = p_j^m(1 -
p_j^m)$. For a fixed $1\le j\le m$, the $X^m_{i,j}$s are independent and identically distributed. Since the
expression (15) has a purely combinatorial nature, we still have

$$V_1(n,m) = \max_{0=l_0\le l_1\le\cdots\le l_{m-1}\le l_m=n}\ \sum_{j=1}^m\sum_{i=l_{j-1}+1}^{l_j}X^m_{i,j},$$

with the convention that $\sum_{i=l_{j-1}+1}^{l_j}X^m_{i,j} = 0$ whenever $l_{j-1} = l_j$.

In fact, for most draws, the maximum in $V_1$ is attained on the sums $\sum_{j\in J(m)}\sum_{i=l_{j-1}+1}^{l_j}X^m_{i,j}$
corresponding to the most probable letters, that is, letting

$$V_1'(n,m) = \max_{\substack{0=l_0\le l_1\le\cdots\le l_{m-1}\le l_m=n\\ l_{j-1}=l_j\ \text{for}\ j\notin J(m)}}\ \sum_{j=1}^m\sum_{i=l_{j-1}+1}^{l_j}X^m_{i,j}, \qquad (28)$$

we have, with large probability, $V_1(n,m) = V_1'(n,m)$. However, it is not always true that
$V_1(n,m) = V_1'(n,m)$: for instance, in the case where the $n$ letters drawn are letters with
associated probability strictly less than $p^m_{max}$, $V_1'(n,m) = 0$ while there is an $l = (l_j)_{j=0,\dots,m}$
with $0 = l_0\le l_1\le\cdots\le l_{m-1}\le l_m = n$ such that $\sum_{j=1}^m\sum_{i=l_{j-1}+1}^{l_j}X^m_{i,j} > 0$, ensuring that
$V_1(n,m) > 0$. In the sequel, we prove Theorem 6 by first showing that the statement of the
theorem is true for $V_1'(n,m)$ instead of $V_1(n,m)$ and then by controlling the error made when
$V_1'(n,m)$ is replaced by $V_1(n,m)$.

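Centering and reducing. Let $\tilde X^m_{i,j} = (X^m_{i,j} - p_j^m)/\sigma_j^m$ be the corresponding centered and
normalized Bernoulli random variables and let $\tilde S_k^{m,j} = \sum_{i=1}^k\tilde X^m_{i,j}$. Also, let

$$\tilde V_1'(n,m) = \max_{\substack{0=l_0\le l_1\le\cdots\le l_{m-1}\le l_m=n\\ l_{j-1}=l_j\ \text{for}\ j\notin J(m)}}\ \sum_{j\in J(m)}\big(\tilde S^{m,j}_{l_j} - \tilde S^{m,j}_{l_{j-1}}\big) = \sup_{t\in I_{k(m(n))}(n)}\sum_{l=1}^{k(m(n))}\big(\tilde S^{m,j_l}_{[t_l]} - \tilde S^{m,j_l}_{[t_{l-1}]}\big),$$

which is to be compared to (26). Since $V_1'(n,m) - np^m_{max} = \sigma_m\tilde V_1'(n,m)$, we have

$$k(m)^{2/3}\,\frac{V_1'(n,m) - np^m_{max} - 2\sqrt{nk(m)p^m_{max}}}{\sqrt{k(m)p^m_{max}n}}
= k(m)^{1/6}\,\frac{\sigma_m\tilde V_1'(n,m) - 2\sqrt{nk(m)p^m_{max}}}{\sqrt{np^m_{max}}}$$
$$= \frac{\sigma_m}{\sqrt{p^m_{max}}}\times\frac{\tilde V_1'(n,m) - 2\sqrt{nk(m)}}{n^{1/2}k(m)^{-1/6}} + 2k(m)^{2/3}\big(\sqrt{1 - p^m_{max}} - 1\big).$$

Since $\sigma_m \sim \sqrt{p^m_{max}}$ and $k(m)^{2/3}\big|\sqrt{1 - p^m_{max}} - 1\big| \le k(m)^{2/3}p^m_{max} \le k(m)^{-1/3} \to 0$,
it remains to show that

$$\frac{\tilde V_1'(n,m) - 2\sqrt{nk(m)}}{n^{1/2}k(m)^{-1/6}} \Rightarrow F_{TW}, \qquad (29)$$

for which we shall use (27).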



Sketch of proof of (29). Roughly speaking, the proof of (29) follows along the same lines
as the corresponding proof of the convergence of (18), changing only $m$ to $k(m)$. We show that if
$k(m(n)) = o(n^{3/10}(\log n)^{-3/5})$, then for some Brownian motions given via strong approximation,
we have

$$\big|\tilde V_1'(n,m) - L_1(n,k(m(n)))\big| \le 2\sum_{l=1}^{k(m(n))}\big(Y_n^{m,l} + W_n^{m,l}\big),$$

where

$$Y_n^{m,l} = \max_{1\le i\le n}\big|\tilde S_i^{m,j_l} - B^{m,l}(i)\big| \quad\text{and}\quad W_n^{m,l} = \sup_{\substack{0\le s,t\le n\\ |s-t|\le1}}\big|B^{m,l}(s) - B^{m,l}(t)\big|.$$

Indeed, setting $A_n^1 = \{\max_{l\le k(m(n))}Y_n^{m,l} > a_n\}$, for some $a_n = O\big(c_1(k(m(n)))^{-1}\log n\big) \ge
a_n^0 := \log\big(1 + c_2(k(m(n)))n^{1/2}\big)/c_1(k(m(n)))$, and setting $A_n^2 = \{\max_{1\le l\le k(m(n))}W_n^{m,l} > b_n\}$, for
some $b_n = O(\log n)$, we show that

$$P(A_n^1) \to 0, \quad P(A_n^2) \to 0, \quad\text{when } n \to +\infty.$$

From ()27p . the approximation of fc(rn(n))) by fc(m(n)))) will imply the theorem if 



' Vl{n, k{m{n))) - Li{n, k{m{n))) 



> Cn 



0, n — > +00, 



for some 



c„ =o(ni/2fc(m(n))-i/6). 
Since hm„^+oo (F(A'/) + P(yl'2')) = and 



(30) 
(31) 



{\v;{n, k{m{n))) - ii(n, fc(TO(n)))| > c„} n (A^')^ n (A'2') 
2A;(TO(n))3/2 /2(1 + log(l + C2{k{m{n)))n^''^)) 



< 



C3 



logn 



(32) 



a choice of c„, ensuring that the bound in ([5^ goes to zero and is compatible with (PT|) . is possible 
since fc(m(n)) = o(n^/^''(logri)~'^/^). This proves ((29)) and thus the statement ([TT|) of Theorem|6l 
but for Vl{n,m) instead oiVi{n,m). 

Control of the error $V_1(n,m) - V_1'(n,m)$. Clearly $V_1(n,m) - V_1'(n,m) \ge 0$ and is, in fact,
zero with a large probability, so that we expect $E[|V_1(n,m) - V_1'(n,m)|]$ to be small. Actually, we
show the following:

Lemma 9 For some absolute constant $C > 0$, we have

$$E\big[|V_1(n,m) - V_1'(n,m)|\big] \le Cnp^m_{2nd}, \qquad (33)$$

where $p^m_{2nd}$ stands for the second largest probability for the letters of $\mathcal{A}_m$.

The conclusion in (11) holds true when

$$\lim_{n\to+\infty}E\big[|V_1(n,m) - V_1'(n,m)|\big]\times\frac{k(m(n))^{2/3}}{\sqrt{k(m(n))p^{m(n)}_{max}n}} = 0. \qquad (34)$$

However, with the help of (33), the conclusion in (34) is then valid when
$\lim_{n\to+\infty}\sqrt{n}\,p^{m(n)}_{2nd}\,k(m(n))^{1/6}/\sqrt{p^{m(n)}_{max}} = 0$
and, since $k(m(n)) = o(n^{3/10}(\log n)^{-3/5})$, this will follow from (10).

It remains to prove Lemma 9, that is, to give an explicit bound on $E[|V_1(n,m) - V_1'(n,m)|]$.
To do so, rewrite $V_1(n,m) = \max_{l\in I(m)}Z(l)$ and $V_1'(n,m) = \max_{l\in I^*(m)}Z(l)$ where $I(m) = \{l =
(l_j)_{1\le j\le m} : l_{j-1}\le l_j,\ l_0 = 0,\ l_m = n\}$, $I^*(m) = \{l\in I(m) : l_{j-1} = l_j \text{ for } j\notin J(m)\}$ and

$$Z(l) = \sum_{j=1}^m Y_j(l), \qquad Y_j(l) = \sum_{i=l_{j-1}+1}^{l_j}X^m_{i,j}.$$

Clearly, since $I^*(m)\subset I(m)$, we have $V_1'(n,m) \le V_1(n,m)$. Moreover, since the $X^m_{i,j}$ are Bernoulli
random variables with parameter $p_j^m$ and since the $X_i$s are independent, we have $Y_j(l) \sim B(l_j -
l_{j-1}, p_j^m)$ and $\sum_{j\in J(m)}Y_j(l) \sim B\big(\sum_{j\in J(m)}(l_j - l_{j-1}), p^m_{max}\big)$, where $B(n,p)$ stands for the binomial
distribution with parameters $n$ and $p$.

If $l\in I^*(m)$, $Z(l) = \sum_{j\in J(m)}Y_j(l) \sim B(n,p^m_{max})$, since, in this case $n = \sum_{j=1}^m(l_j - l_{j-1}) =
\sum_{j\in J(m)}(l_j - l_{j-1})$. If $l\notin I^*(m)$, we rewrite $Z(l)$ as

$$Z(l) = Z(\bar l) + R(l),$$

where $\bar l\in I^*(m)$ and $R(l)$ is an error term. Indeed, let $J_l = \{j\notin J(m) : l_{j-1} < l_j\}$ and for $j\in J_l$,
define 

max Aj, if A J 7^ 0, 
min , otherwise, 

where Aj ~ {k E Jim) : fc < j} and where Bj = {fc G J(to) : fc > j}. Now, 



i.,3 



z{i) = E E ^^S + E E 

= E E ^i;} + E E ^"^(.) (35) 
h 

+ E E (^"}-^r«o-))- (36) 

Define / G I*{m) by /j = if j ^ J(m) and Zj = h-i for j G J{fn), where k = min{/ > j : I € 
J(rn)}, with the conventions that min0 — m + 1 and that Ij^-i — for jg = min J(m). We then 
have 

E E ^™ + E E ^TM,) = m- 

jGJ{m) jeJii=lj-i+l 

Let a™ := X™ — X[^j^^^ be the random variables taking the values —1, and +1 with respective 
probabilities p™^^, 1— p™^^— andp™. Independently, let be Bernoulli random variables with 
parameter = (p^„d-pf )/(l-p™a:r-P") G (0, 1), where p™^ = max(p™ < p™^^ : 1 < J < m), 
and define 

P^. =: <^ 0, a™ = and e^'^- = 0, 

[ +1, a™ = +1 or aj^'^- = and e™ = 1. 

Note that P(/3™- = +1) = Pand and that a™ < /3™ , so that 



i?(0<i?(0 = E E ^i^y 

j&Jl i=lj-i+l 
Since $Z(l) \le Z(\bar l) + \bar R(l)$, we have

$$\max_{l\in I(m)}Z(l) \le \max_{l\in I(m)}Z(\bar l) + \max_{l\in I(m)}\bar R(l) \le \max_{l\in I^*(m)}Z(l) + \max_{l\in I(m)}\bar R(l).$$

Next, observe that for $l\in I^*(m)$, $\bar R(l) = 0$. However, since the event $\{\bar R(l) < 0,\ \forall l\notin I^*(m)\}$ is
non-negligible, we cannot change $\max_{l\in I(m)}\bar R(l)$ into $\max_{l\notin I^*(m)}\bar R(l)$. We obtain

$$0 \le \max_{l\in I(m)}Z(l) - \max_{l\in I^*(m)}Z(l) \le \max_{l\in I(m)}\bar R(l).$$

The random variable R{1) is the sum of X^^g i-*-'^- random variables so that max^g/^^) R{1) 

is distributed according to ^niaxi<fe<„ ^^^^ , where (/3J")i are i.i.d. with 

p(^r = -i)=p;^ax, p(^r = o) = i-Cax-Pw, p(^r = +i)=P^„rf- (37) 



We are now interested in bounding E 



(^maxi<fc<„ J2i=i l^r) 



Let (e™)i be i.i.d. Bernoulli random variables with parameter p^g^^ '^PTnd ^^"^ ^^^i independently, 
{Yl^)i be z.i.d. Rademacher random variables with parameter P2ndKP2nd ~^ Pmax) i^-e-, V{Yf^ = 
1) = 1 - ¥{Yr = -1) = PTnd/iPZd+P'^^a.))- Then ^™ and efl^™ have the same distribution 
and we have 



E 



max 

l<fc<n 



i=l 



E 



max Veri^" 



i<fc<?i 







= E 


E 



V i=l / 



Gn 



where Qn = cri^T" '■ ^ < i <n). However, since (e™)i is independent of we have 



E 



max 

l<k<n 



Qn 



max 

i<k<e 



E^" 



where £ = X;r=i has a B{n,p^„^ +P^nd) distribution. But 



E 



max 

l<k<t 



i=l 




max > ¥/ 



> k 



+ 00 J 

E(i-i^(,^«^,E^r<fc)) 

fc=0 ~ i=l 

E(i-^(,^sE^r^^)) 



fe=0 



i=l 



where = X^fe=o^£,fe and u^,fe = P(inaxi<j<£ ^^^^ Y^™ < A;). With the latest notation, we are 
now investigating 7„ = E[^ - JTg]. For simphcity, in the sequel, we set p*,™ := ■p^^J (P^nd+Pmax) 

and 9*jrn 1 P*,m- 

The elements of the sequence (w^,fc)i<fe<£-i satisfy the following induction relations: 

Ue,k = q*,mUe-l,k+l + P*,mUi-l^k-l, k>l, Uifi = g*,mW^-l,H 

and ue^k = 1 for k > £. From these, we derive Ui = 2q^:^m — <!'*,mit^-i,o + Ue-i and, since 






In order to compute Wfc,o, we introduce the hitting time r™ = min (jt > 1 '. ~ 

of the random walk (J2i<j We then have 



"(rr < k) 



max 

i<k 



Y^YJ- >l)=l- v{rna^Y.Yp < o) = 1 



Uk,0 



so that Eti "'c.o = Eti P(^r > fc + 1) = EL2 P(^r > fc) = -1 + ELi P(^r > fc) and 

e 

Ue = 2%,„-g,,„^P(Tr > fc) 
+00 
1=1 



Next, 



and we have 



2%,m - g*,mE[T{" A 



E 



max 

i<k<i 



i=i 



£(l-2g,,„0+g,,™E[r™A£|g„] 



7„ := E[£-Ue] 

= E[£{1 - 2q,^„,) + g*,™E[Tr A ^|g„]] 
< (1 - 2q,^„, + q,,„,)E[£] 



This completes the proof of Lemma [S] 



□ 
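
Lemma 9 can be explored by simulation. The sketch below is a rough Monte Carlo illustration, not part of the proof: it assumes letters encoded as 1..m, uses the fact that $V_1'$ only involves blocks of most probable letters (so that it equals the longest weakly increasing subsequence of the subword restricted to $J(m)$), and all names and parameter values are illustrative choices.

```python
import bisect
import random


def lwis(word):
    """Longest weakly increasing subsequence length (patience sorting)."""
    tails = []
    for x in word:
        pos = bisect.bisect_right(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)


def mean_error(n, probs, trials, seed=0):
    """Monte Carlo estimate of E[V_1 - V_1'] for i.i.d. letters 1..m with law `probs`."""
    rng = random.Random(seed)
    letters = list(range(1, len(probs) + 1))
    p_max = max(probs)
    most_probable = {a for a, p in zip(letters, probs) if p == p_max}
    total = 0
    for _ in range(trials):
        word = rng.choices(letters, weights=probs, k=n)
        v1 = lwis(word)
        # V_1' only uses blocks of most probable letters, i.e. the subword on J(m).
        v1_prime = lwis([x for x in word if x in most_probable])
        total += v1 - v1_prime
    return total / trials


if __name__ == "__main__":
    probs = [0.4, 0.4, 0.1, 0.1]  # illustrative law: k(m) = 2, second largest probability 0.1
    n = 200
    print(mean_error(n, probs, trials=200), "to be compared with n * p_2nd =", n * 0.1)
```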



4 Concluding remarks 

A natural question to consider next would be to derive a result similar to Theorem 4 for
non-uniformly distributed letters. The special case of the longest increasing subsequence (i.e., $r = 1$)
is dealt with in Theorem 6. Let us investigate what happens for the whole shape of the Young
tableau.

First, let us slightly expand our notation. In this section, $X_i^m$, $1\le i\le n$, are independently
and identically distributed with $P(X_1^m = \alpha_j^m) = p_j^m$. In order to simplify the notation, we assume
(without loss of generality) that the ordered letters $\alpha_1^m < \cdots < \alpha_m^m$ have, moreover, non-increasing
probabilities (i.e., $p_1^m \ge p_2^m \ge \cdots \ge p_m^m$). Let $d_r^m = \mathrm{card}\{j : p_j^m = p_r^m\}$ be the multiplicity of
$p_r^m$ and let $m_r^m = \max\{i : p_i^m > p_r^m\}$ be the number of letters (strictly) more probable than $\alpha_r^m$.
Let $J_r(m) = \{i : p_i^m = p_r^m\} = \{m_r^m + 1, \dots, m_r^m + d_r^m\}$ be the indices of the letters with the same
probability $p_r^m$. We recover our previous notation, $r = 1$, with $k(m) = d_1^m$ and $J(m) = J_1(m)$.
Since the expression (16) has a purely combinatorial nature, it still holds true that

r m — r+j kj i 

E E E 

ii=i i=j i=fci, 1-1+1 



Vr(n,m)= max Xf 



Let i/^ = Y!1=\VT- Note that, from Theorem 5.2 in [HL3], when TO is fixed and n — ^ +cx3, we 
have for each 1 < r < to that 

14 (n, m) - z/™n ^^ ^ 



l<fe<r 
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where the hmit is given in Section 6 of [HL3] by = Z(m, r) + VPf ^r-m;",d^" , with Z{m, r) - 
AA(0, vT') for < = (1 - v'^^i^) + (p;"(r - m;"))^, and 

(r (m-r+j) 

for 

-fr,m = {t = (tj,/, 1 < j < r, < / < to) : tj j-i = 0, tj^rn-r+] = 1, 1 < j < 

< < j < < ^ <m~l,tj^i < tj_i,,,2 < J < r,l < Z < TO - l}. 

Note that $D_{r,m}$ is a natural generalization of the Brownian functional $L_1(s,k)$ used in Section 3
(see also, in a queueing context, [GW] and [Ba]). In particular, $D_{r,m}$ is equal in distribution to
the sum of the $r$ largest eigenvalues of an $m\times m$ matrix from the GUE and Theorem 2 can be
rewritten as

$$\big(m^{1/6}(D_{k,m} - 2k\sqrt{m})\big)_{1\le k\le r} \Rightarrow F_r\Theta_r^{-1}, \quad m \to +\infty. \qquad (39)$$

Arguing as in the previous sections, we would like to derive the fluctuations of $(V_k(n,m))_{1\le k\le r}$
with respect to $n$ and $m$ simultaneously from (38) and (39). However, in the non-uniform case, this
is not that transparent since, for each $r \ge 1$, the behavior of $m_r^m$ and of $d_r^m$, with respect to $m$, is
not that clear cut. In particular, $r - m_r^m$ may not be stationary and (39) can no longer be used for
$D_{r-m_r^m,d_r^m}$. Besides, the random fluctuations of $\sqrt{p_r^m}\,D_{r-m_r^m,d_r^m}$ are of order $(p_r^m)^{1/2}(d_r^m)^{-1/6}$,
which, in general, does not dominate those of $Z(m,r) \sim \mathcal{N}(0, v_r^m)$. Thus, for general non-uniform
alphabets, we cannot infer which part of the law of the limit $Z(m,r) + \sqrt{p_r^m}\,D_{r-m_r^m,d_r^m}$ will drive
the fluctuations. We can imagine that, taking simultaneous limits in $n$ and $m$, the fluctuations
of $V_r(n,m(n))$, properly centered and normalized, are either Gaussian, or driven by $F_r$ (as in
Theorem 4), or given by an interpolation between these distributions, depending on the alphabets
considered.